POSTECH at NTCIR-5

Authors

  • Seung-Hoon Na
  • In-Su Kang
  • Jong-Hyeok Lee

Abstract

This paper describes our methodologies for NTCIR-5 CLIR involving Korean and Japanese, and reports the official results as well as retrieval results on NTCIR-3 and NTCIR-4 data. We participated in four tracks: the K-K and J-J monolingual tracks and the K-J and J-K cross-lingual tracks. Unlike in English, term extraction in Asian languages such as Korean and Japanese is nontrivial because of segmentation ambiguities. We therefore prepared multiple term representations for documents and queries, and merged the ranked results obtained from each representation to generate the final ranking. In preliminary experiments on NTCIR-3 and NTCIR-4 data, our model showed the best performance for description queries in Korean and Japanese. In offline results on NTCIR-5 data, our method for Korean showed the best performance, achieving 0.5680 for description queries and 0.6159 for the other queries.
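The abstract does not say how the ranked lists from the different term representations are merged. As a hedged illustration only, the sketch below uses CombSUM over min-max-normalized scores, a standard data-fusion method that is not confirmed to be the authors' choice. Each input run stands for the results of one term representation (for example, a word-based index and a character-bigram index); the function names, the optional `weights` parameter, and the toy scores are hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one run's document scores to [0, 1] so runs become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents scored equally in this run; treat them all as top-scored.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs, weights=None):
    """Merge runs (one per term representation) by a weighted sum of normalized scores.

    runs: list of dicts mapping doc_id -> retrieval score.
    weights: optional per-run weights (hypothetical tuning knob); defaults to 1.0 each.
    Returns doc_ids sorted by fused score, highest first.
    """
    if weights is None:
        weights = [1.0] * len(runs)
    if len(weights) != len(runs):
        raise ValueError("need exactly one weight per run")
    fused = defaultdict(float)
    for run, w in zip(runs, weights):
        for doc, s in min_max_normalize(run).items():
            fused[doc] += w * s  # a doc absent from a run adds nothing for that run
    # Highest fused score first; ties broken by doc_id for a deterministic order.
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    # Toy scores from two hypothetical representations of the same query.
    bigram_run = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
    word_run = {"d2": 5.0, "d3": 4.0, "d4": 1.0}
    print(comb_sum([bigram_run, word_run]))  # ['d2', 'd1', 'd3', 'd4']
```

Normalizing before summing keeps a representation whose raw scores happen to be larger from dominating the merge, while a document retrieved well by several representations (here `d2`) rises above one that only a single representation ranks first.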

Similar sources

POSTECH at NTCIR-5 Patent Retrieval: Smoothing Experiments in a Language Modeling Approach to Patent Retrieval

This report describes the experimental results of our participation in the Document Retrieval Subtask of the NTCIR-5 Patent Retrieval Task. Unlike newspaper articles, the main document type handled in previous information retrieval experiments, patent documents have many different characteristics in terms of length, technicality, structure, etc. Among these, we focus on the lengt...

POSTECH at NTCIR-6 English Patent Retrieval Subtask

This paper reports our experimental results at the NTCIR-6 English Patent Retrieval Subtask. Our previous participation in the patent retrieval Subtask revealed that long patent applications require less smoothing of the document model than general documents such as newspaper articles. We set up the initial baseline retrieval system for U.S. patent applications and compare the...

The POSTECH Statistical Machine Translation Systems for NTCIR-7 Patent Translation Task

This paper describes the POSTECH statistical machine translation (SMT) systems for the NTCIR-7 patent translation task. We entered two patent translation subtasks: Japanese-to-English (KLE-je) and English-to-Japanese (KLE-ej) translation. The baseline systems are derived from a common phrase-based SMT framework. In addition, for Japanese-to-English translation, we adopted two kinds of methods. ...

POSTECH Question-Answering Experiments at NTCIR4-QAC

This paper describes our system and additional experimental results in NTCIR-4 QAC Task 1. The main components of our system are question classification, passage retrieval, and named entity extraction. Passage retrieval was performed by a density-based ranking method based on the importance of the query terms occurring in the passage. Question classification and named entity extraction were designed by ...

POSTECH at NTCIR-6: Combining Evidences of Multiple Term Extractions for Mono-lingual and Cross-lingual Retrieval in Korean and Japanese

This paper describes our methodologies for NTCIR-6 CLIR involving Korean and Japanese, and reports the official results for Stage 1 and Stage 2. We participated in three tracks: the K-K and J-J monolingual tracks and the J-K cross-lingual track. As in the previous year, we focus on handling segmentation ambiguities in Asian languages. To this end, we prepared multiple term representations for documents...

Publication date: 2005